DIGITAL RECORDING AND PLAYBACK SYSTEM
WITH VOICE RECOGNITION CAPABILITY 
FOR CONCURRENT TEXT GENERATION 

RELATED U.S. APPLICATIONS 

This application is a continuation of and claims priority to U.S. Patent No.
6,754,619 entitled "Digital Recording And Playback System With Voice
Recognition Capability For Concurrent Text Generation," by Nakatsuyama, filed
on 11/15/99, which is incorporated herein by reference.



BACKGROUND OF THE INVENTION 
Field of the Invention 
The present invention relates to the design of digital recording and
playback systems. More specifically, the present invention pertains to the
processing of voice and concurrent generation of corresponding text in a 
portable digital appliance. 



Related Art

Portable digital recording and playback devices are quickly
gaining popularity in business and among individual users. In particular, one 
attractive feature of digital recording is the possibility of converting the voice 
messages into text, which can then be reviewed, revised and incorporated into
documents or otherwise retrieved for later use. Today, there are
several models of portable digital recorders in the marketplace. These prior art
recorders typically record voice messages as compressed digital data. In order 
to convert the compressed digital data to text data, a separate computer program 
is generally required. Thus, in the prior art, subsequent to a recording session,
the user needs to post-process the compressed digital data to perform the voice- 
to-text conversion. This requires additional processing time, and in some cases 
even requires the user to transfer the compressed digital data from the portable 
device to a personal computer (PC) having the necessary software program
before the conversion can be performed. It is desirable to eliminate the extra 
step of post-recording conversion from compressed digital data to text data in a 
portable digital recording and playback system. 

These prior art devices are not well suited for generating text data from
the recorded voice data for an additional reason. In order to achieve good 
conversion from voice to text, a high quality voice input to the voice to text 
conversion engine is needed. In prior art portable systems, the voice data is 
subject to high compression because portable systems typically have limited
memory capacity, and high compression allows more voice data to be stored in
the limited memory resources. Since voice data is stored in a highly compressed 
format in these portable prior art devices, the text data generated directly from 
the compressed voice data by a conversion program is usually unsatisfactory. 
As such, it is highly advantageous to have a portable digital recording and
playback system which provides high quality conversion from voice to text.

Furthermore, portable devices are typically battery-powered. Thus, the 
need to conserve power is a major design consideration. As such, while a high 
capacity storage can potentially be used in a large, non-portable device deriving
its power from a power outlet to improve the quality of the conversion from
compressed voice data to text data, it is not a viable option in a portable device.
Therefore, there exists a need for a portable digital recording and playback 
system which provides high quality conversion from voice to text and yet does
not require a high rate of power consumption. 

SUMMARY OF THE INVENTION

In implementing a viable portable digital recording and playback system, it 
is highly desirable that components that are well known in the art and are 
compatible with existing computer systems and other appliances be used so that 
the cost of realizing the portable digital recording and playback system is low. By
so doing, the need to incur costly expenditures for retrofitting existing computer 
systems and other appliances or for building custom components is 
advantageously eliminated. 

Thus, a need exists for a portable digital recording and playback system
which does not require post-recording conversion to generate text data from 
compressed digital data. A further need exists for a portable digital recording 
and playback system which meets the above need and which provides high 
quality conversion from voice to text. Still another need exists for a portable
digital recording and playback system which meets both of the above needs and
which does not require a high level of power consumption. Yet another need 
exists for a portable digital recording and playback system which meets all of the 
above needs and which is conducive to use with existing computer systems and 
other appliances. 

Accordingly, the present invention provides a portable digital recording 
and playback system which generates text data from voice without requiring 
post-recording conversion from compressed digital data to text data. The 
present invention further provides a portable digital recording and playback 
system in which voice-to-text conversion is not only performed without post-processing
but is also of high quality. Embodiments of the present invention
perform voice-to-text conversion using the high quality audio input signal rather 
than highly compressed voice data so that high quality conversion is achieved.
Moreover, the present invention provides a portable digital recording and 
playback system which includes the above features and which conserves power 
for full battery operation. Furthermore, embodiments of the present invention 
utilize components that are well known in the art and are compatible with existing
computer systems and other appliances, so that the present invention is
conducive to use with existing computer systems and other appliances. These
and other advantages of the present invention not specifically mentioned above 
will become clear within discussions of the present invention presented herein. 

More specifically, in one embodiment of the present invention, a digital 
recording and playback system is provided. In this embodiment, the system 
comprises an audio capturing device configured to receive a voice input. The 
system also comprises a high compression encoder (HCE) coupled to the audio
capturing device and configured to generate digital wave data corresponding to
the voice input. The system further comprises a voice recognition engine (VRE) 
coupled to the audio capturing device and configured to generate text data 
corresponding to the voice input. Moreover, in this embodiment, the HCE and 
VRE are selectively coupled to a memory sub-system which is configured to
store the digital wave data and the text data. In particular, in this embodiment,
the HCE and the VRE are operable to concurrently generate the digital wave 
data and the text data in response to the voice input such that the digital wave 
data and the text data can be stored in the memory sub-system in a 
synchronized manner. Thus, in this embodiment, the present invention provides
recording capability wherein text data is generated from a voice input without
requiring post-recording conversion. In a specific embodiment, the present
invention includes the above, and the system is battery-powered.

Additional embodiments of the present invention include the above and
further comprise a decoder selectively coupled to the memory sub-system and
configured to decode the digital wave data into decoded audio data, a
digital-to-analog (D/A) converter coupled to the decoder and configured to convert the
decoded audio data into an analog signal, and an audio output device coupled to
the D/A converter and configured to generate a voice output corresponding to
the voice input from the analog signal. Moreover, these embodiments also
comprise a display sub-system selectively coupled to the memory sub-system
and configured to display the text data. Thus, in these embodiments, the
present invention provides simultaneous voice playback and text display. 



SONY-50N3172 



Confidential 



Marked-Up Substitute Specification 

BRIEF DESCRIPTION OF THE DRAWINGS 

The accompanying drawings, which are incorporated in and form a part of 
this specification, illustrate embodiments of the invention and, together with the 
description, serve to explain the principles of the invention:

Figure 1 is a block diagram illustrating a portable digital recording and 
playback system 100 in accordance with one embodiment of the present 
invention, wherein the system has built-in voice recognition capability for 
concurrent text generation during voice recording.

Figure 2A is a flow diagram illustrating steps for performing recording 
using system 100 of Figure 1 in accordance with one embodiment of the present 
invention. 

Figure 2B is a diagram illustrating one embodiment of an arrangement of
corresponding portions of voice data and text data as stored in a portable digital 
recording and playback system 100 in accordance with the present invention. 

Figure 3 is a flow diagram illustrating steps for performing playback using
system 100 of Figure 1 in accordance with one embodiment of the present 
invention. 

DETAILED DESCRIPTION OF THE INVENTION

In the following detailed description of the present invention, a digital 
recording and playback system with voice recognition capability for concurrent 
text generation, numerous specific details are set forth in order to provide a 
thorough understanding of the present invention. However, it will be recognized
by one skilled in the art that the present invention may be practiced without these 
specific details or with equivalents thereof. In other instances, well known 
methods, procedures, components, and circuits have not been described in 
detail so as not to unnecessarily obscure aspects of the present invention.

Exemplary Configuration of a Digital Recording and Playback System of the Present Invention
Figure 1 is a block diagram illustrating a portable digital recording and 
playback system 100 in accordance with one embodiment of the present 
invention, wherein the system has built-in voice recognition capability for
concurrent text generation during voice recording. In system 100, an audio
capturing device 110 is coupled to a high compression encoder (HCE) 120.
Audio capturing device 110 is also coupled to a voice recognition engine (VRE)
130. Both HCE 120 and VRE 130 are selectively coupled to a memory
sub-system 140 through an intelligent switch 135. More particularly, switch 135 is
operable to couple either HCE 120 or VRE 130, but not both, to memory sub- 
system 140 at any given time. In one embodiment, switch 135 is a multiplexer. 
In another embodiment, switch 135 is a software switch for data routing. In an 
exemplary embodiment, audio capturing device 110 comprises a microphone. It 
is appreciated that audio signals are supplied to HCE 120 and VRE 130
simultaneously so that voice encoding and recognition functions can be 
performed in parallel. 

It is appreciated that within the scope of the present invention, memory
sub-system 140 can comprise volatile memory (e.g., random access memory
(RAM)), non-volatile memory (e.g., read only memory (ROM)), and/or data storage
devices such as magnetic or optical disk drives and disks (e.g., diskettes, tapes,
cartridges) which are computer readable media for storing information and
instructions. These memory modules of memory sub-system 140 can be
removable to facilitate the easy transfer of data stored therein. In one 
embodiment, memory sub-system 140 comprises semiconductor flash memory. 



Still referring to Figure 1, memory sub-system 140 is selectively coupled 
to both a decoder 150 and a display sub-system 180 through an intelligent switch
145. More particularly, switch 145 is operable to couple memory sub-system 
140 to either decoder 150 or display sub-system 180, but not both, at any given 
time. In one embodiment, switch 145 is a multiplexer. In another embodiment, 
switch 145 is a software switch. In one embodiment, switch 145 is controlled by 
the texted voice data generated by VRE 130. Moreover, in an exemplary
embodiment, display sub-system 180 comprises flat panel display technology, 
for example, a liquid crystal display (LCD). 

With reference still to Figure 1, decoder 150 is further coupled to a
digital-to-analog (D/A) converter 160. Moreover, D/A converter 160 is coupled to an
amplifier 165, which is in turn coupled to an audio output device 170. In one 
embodiment, audio output device 170 comprises a speaker. 

With reference still to Figure 1, in one embodiment, an editing sub-system 
190 is coupled to memory sub-system 140. In this embodiment, editing sub- 
system 190 can include an alphanumeric input device having alphanumeric and 
function keys to allow user editing of the text data. Editing sub-system 190 can
also include a cursor control or directing device to facilitate text editing and
command selection by a user. The cursor control device allows the user to
dynamically signal the two-dimensional movement of a visible symbol (cursor) on
a screen of display sub-system 180. Many implementations of a cursor control
device are known in the art, including a trackball, mouse, touch pad, joystick or
special keys on the alphanumeric input device capable of signaling movement in
a given direction or manner of displacement. Alternatively, it will be appreciated
that a cursor can be directed and/or activated via input from the alphanumeric
input device using special keys and key sequence commands. The present
invention is also well suited to directing a cursor by other means such as, for
example, voice commands. Moreover, editing sub-system 190 can further
include a printing device for generating paper copies of the text data. 

Operation of a Digital Recording and Playback System of the Present Invention 
Referring next to Figure 2A, a flow diagram 200 illustrating steps for
performing recording using system 100 of Figure 1 in accordance with one 
embodiment of the present invention is shown. In step 210, system 100 receives 
a voice input using audio capturing device 110. 

In step 220, system 100 of Figure 1 generates digital wave data from the 
voice input using HCE 120. In an exemplary embodiment, HCE 120 of system 
100 can achieve a compression rate of two kilobits per second (2 kbit/s). It
is appreciated that the high level of compression of the digital wave data in
accordance with the present invention advantageously reduces the amount of
memory that is required to store the digital wave data. 
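
By way of illustration only, the memory saving can be estimated with a short calculation. The sketch below uses the exemplary 2 kbit/s rate given above; the function name `storage_bytes` and the uncompressed comparison format (8 kHz, 16-bit mono PCM) are assumptions introduced here for comparison and are not taken from the specification.

```python
def storage_bytes(bit_rate_bps: float, seconds: float) -> float:
    """Return the bytes needed to hold `seconds` of audio at `bit_rate_bps`."""
    return bit_rate_bps * seconds / 8  # 8 bits per byte


# Exemplary rate stated for HCE 120 in the description.
HCE_BIT_RATE_BPS = 2_000

# Assumed reference format (not from the specification):
# 8,000 samples per second at 16 bits per sample, one channel.
PCM_BIT_RATE_BPS = 8_000 * 16

ONE_HOUR_S = 3600

hce_bytes = storage_bytes(HCE_BIT_RATE_BPS, ONE_HOUR_S)
pcm_bytes = storage_bytes(PCM_BIT_RATE_BPS, ONE_HOUR_S)

print(f"HCE: {hce_bytes:,.0f} bytes per hour")
print(f"PCM: {pcm_bytes:,.0f} bytes per hour")
print(f"Reduction factor: {pcm_bytes / hce_bytes:.0f}x")
```

Under these assumptions, one hour of voice occupies 900,000 bytes at 2 kbit/s, against 57,600,000 bytes for the assumed uncompressed format, a reduction by a factor of 64.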



In step 230, system 100 of Figure 1 generates text data from the voice 
input using VRE 130. In one embodiment, VRE 130 of system 100 uses Hidden 
Markov Model (HMM) techniques to perform voice recognition, although other
voice recognition techniques can also be used within the scope of the present 
invention. It is also appreciated that the text data can be in any of a wide variety 
of formats. In an exemplary embodiment, the text data is generated in hypertext 
markup language (HTML) format. 

Referring still to Figure 2A, in step 240, system 100 of Figure 1 stores the 
digital wave data and the text data as mixed data in memory sub-system 140 in a 
synchronized manner. More specifically, in one embodiment, steps 220 and 230 
are performed concurrently, and the digital wave data and the text data generated
are sent to memory sub-system 140 via switch 135 in alternate fashion such that a
particular portion of the digital wave data is correlated with the corresponding
portion of the text data as they are being stored as mixed data. In an exemplary
embodiment, the present invention employs a buffering mechanism in
conjunction with switch 135 to handle timing delays that may arise during the
voice recognition process (e.g., digital wave data is generated more quickly by
HCE 120 than the corresponding text data is generated by VRE 130) to ensure
that corresponding portions of digital wave data and text data are synchronized
when they are stored in memory sub-system 140.
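
The buffering behavior described above can be sketched in software as follows. This is a minimal illustration and not the implementation of switch 135: the class name `InterleavingSwitch`, its method names, and the assumption that wave portions and text portions arrive in order and pair one-to-one are all hypothetical.

```python
from collections import deque


class InterleavingSwitch:
    """Hypothetical software model of switch 135 with a buffer on each input."""

    def __init__(self) -> None:
        self.wave_buf: deque = deque()  # portions waiting from the encoder
        self.text_buf: deque = deque()  # portions waiting from the recognizer
        self.memory: list = []          # stands in for memory sub-system 140

    def on_wave(self, portion: bytes) -> None:
        """Accept one portion of digital wave data from the encoder."""
        self.wave_buf.append(portion)
        self._flush()

    def on_text(self, portion: str) -> None:
        """Accept one portion of text data from the recognizer."""
        self.text_buf.append(portion)
        self._flush()

    def _flush(self) -> None:
        # Store a pair only when both halves are available, so that each
        # wave portion sits immediately before its corresponding text portion.
        while self.wave_buf and self.text_buf:
            self.memory.append(("wave", self.wave_buf.popleft()))
            self.memory.append(("text", self.text_buf.popleft()))


switch = InterleavingSwitch()
switch.on_wave(b"w1")
switch.on_wave(b"w2")      # the encoder runs ahead of the recognizer
switch.on_text("hello")    # first text portion arrives late
switch.on_text("world")
switch.on_wave(b"w3")
switch.on_text("again")
print(switch.memory)
```

In this sketch the faster producer is simply held in its queue until the slower one catches up, so the stored sequence always alternates wave portion, text portion, regardless of arrival order.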

Referring next to Figure 2B, a diagram illustrating one embodiment of an
arrangement of corresponding portions of voice data and text data as stored in a
portable digital recording and playback system 100 in accordance with the
present invention is shown. In an exemplary embodiment as shown in Figure
2B, a voice input is converted into portions 261, 262 and 263 of digital wave data
and corresponding portions 271, 272 and 273 of text data. These portions of
digital wave data and text data are then stored in memory sub-system 140 as
mixed data such that respective portions of digital wave data and text data are
synchronized. More specifically, in one embodiment, the data portions are
stored in alternate fashion such that a particular portion of the digital wave data
is correlated with the corresponding portion of the text data (e.g., digital wave
data portion 261 with text data portion 271; digital wave data portion 262 with
text data portion 272; digital wave data portion 263 with text data portion 273).

As such, the present invention enables subsequent access and retrieval
of the stored data to be performed efficiently and conveniently because the text
data can be used to search for a desired portion of digital wave data, and vice
versa, since the text data and digital wave data are synchronized. In one
embodiment, switch 135 is controlled based on phonetic group definitions of the
text in the text data.
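
As a hedged illustration of text-based retrieval, the sketch below searches mixed data for a word and returns the correlated wave portions. The flat alternating list layout, the function name `find_wave_portions`, and the sample contents are assumptions made for this example; the specification does not prescribe a storage format at this level of detail.

```python
def find_wave_portions(mixed: list, query: str) -> list:
    """Return the wave portions whose paired text portion contains `query`.

    `mixed` is assumed to alternate wave portion, text portion, wave portion,
    text portion, and so on, mirroring the alternate storage of Figure 2B.
    """
    hits = []
    for i in range(0, len(mixed) - 1, 2):
        wave, text = mixed[i], mixed[i + 1]
        if query.lower() in text.lower():
            hits.append(wave)
    return hits


# Placeholder bytes stand in for wave portions 261, 262 and 263; the strings
# stand in for text portions 271, 272 and 273.
mixed_data = [
    b"\x02\x61", "call the office",
    b"\x02\x62", "buy more flash memory",
    b"\x02\x63", "the office closes at five",
]

matches = find_wave_portions(mixed_data, "office")
print(matches)
```

Because each text portion is stored next to its wave portion, a text match identifies the audio to play without any separate index; the reverse lookup (wave portion to text) would follow the same pairing.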

By performing real-time voice recognition on the voice input to generate 
text data, embodiments of the present invention eliminate the post-processing
that is typically required in prior art systems in order to derive text data from
stored voice data. Moreover, since the text data is generated directly from the 
voice input in the present invention and not from highly compressed voice data 
as in the prior art, high quality voice-to-text conversion is achieved. In addition, 
since the present invention does not rely on the stored voice data to generate the
text data, the voice input can be subject to high compression and stored as
digital wave data in accordance with the present invention to advantageously 
reduce the amount of memory required for storage without compromising the 
quality of the text data. 

With reference next to Figure 3, a flow diagram 300 illustrating steps for
performing playback using system 100 of Figure 1 in accordance with one 
embodiment of the present invention is shown. In step 310, system 100 of 
Figure 1 retrieves the mixed data which comprises digital wave data and text 
data from memory sub-system 140. 

In step 320, system 100 of Figure 1 decodes the digital wave data into
audio data using decoder 150. In step 330, system 100 converts the audio data
into an analog signal using D/A converter 160. In optional step 340, in one
embodiment, system 100 amplifies the analog signal. In step 350, system 100
generates a voice output corresponding to the voice input from the analog signal. 

It is appreciated that the present invention provides a high quality voice 
output. More specifically, the voice output is based on the recorded voice input 
(as digital wave data) and is a high fidelity reproduction thereof, and not based
on a simulated voice generated using text data. 

With reference still to Figure 3, in step 360, system 100 of Figure 1 
displays the text data using display sub-system 180. More specifically, in one
embodiment, the digital wave data and the text data retrieved are sent to decoder
150 and display sub-system 180 via switch 145 in alternate fashion such that
output of the digital wave data by audio output device 170 and display of the text
data by display sub-system 180 are synchronized. As such, the present invention
affords great convenience to the reviewer of the recorded voice and text. In one
embodiment, switch 145 is controlled based on phonetic group definitions of the
text in the text data. 
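
The alternate routing performed during playback can be sketched as follows. This is an illustrative model only: the function name `play_back`, the tagged-tuple representation of mixed data, and the callback parameters are hypothetical and stand in for switch 145, decoder 150 and display sub-system 180.

```python
from typing import Callable, Iterable, Tuple


def play_back(
    mixed: Iterable[Tuple[str, object]],
    decode: Callable[[bytes], None],
    show: Callable[[str], None],
) -> None:
    """Route each stored portion to exactly one output path, in stored order."""
    for kind, portion in mixed:
        if kind == "wave":
            decode(portion)   # path toward the decoder and audio output
        elif kind == "text":
            show(portion)     # path toward the display
        else:
            raise ValueError(f"unknown portion kind: {kind!r}")


# Assumed mixed data, alternating wave and text portions.
mixed_data = [
    ("wave", b"w1"), ("text", "hello"),
    ("wave", b"w2"), ("text", "world"),
]

events = []  # records the order in which the two outputs are driven
play_back(
    mixed_data,
    decode=lambda p: events.append(("audio", p)),
    show=lambda t: events.append(("display", t)),
)
print(events)
```

Because the portions are consumed in the order in which they were stored, each text portion is displayed immediately after its audio portion is dispatched, which is one simple way to keep the two outputs in step.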

It is appreciated that embodiments of the present invention can operate 
for extended periods of time under battery power (e.g., disposable batteries, 
rechargeable batteries) because components of system 100 (Figure 1) in
accordance with the present invention do not consume power at a high rate.
Thus, the present invention provides a digital recording and playback system
which is operable under battery power and is portable and wherein high quality
text data is generated from a voice input without requiring post-recording
conversion. 

Moreover, it is appreciated that system 100 of Figure 1 in accordance with 
embodiments of the present invention does not require specialized circuit
components or extensive retrofitting of existing computer systems and other
appliances, because the circuit elements required for its implementation are 
commonly used in today's electronic appliances and are fully compatible with 
existing computer systems and other appliances. As such, a portable,
battery-powered digital recording and playback system which does not require
post-processing to generate high quality text data, and which is conducive to use with
existing computer systems and other appliances is provided by the present 
invention. 

It is further appreciated that although exemplary values and operational 
details (e.g., compression rate of HCE 120, voice recognition techniques used in
VRE 130) for various components are given with respect to embodiments of the 
present invention described above, such values and details are illustrative only 
and can vary within the scope and spirit of the present invention. 

The preferred embodiment of the present invention, a digital recording
and playback system with built-in voice recognition capability for concurrent text 
generation, is thus described. While the present invention has been described in 
particular embodiments, it should be appreciated that the present invention 
should not be construed as limited by such embodiments, but rather construed
according to the claims below.

What is claimed is:

1. A digital recording and playback system comprising:
an audio capturing device configured to receive a voice input;
a high compression encoder (HCE) coupled to said audio capturing
device and configured to generate digital wave data corresponding to said voice
input;
a voice recognition engine (VRE) coupled to said audio capturing device
and configured to generate text data corresponding to said voice input;
a memory sub-system selectively coupled to said HCE and said VRE and
configured to store said digital wave data and said text data; and
wherein said HCE and said VRE are operable to concurrently generate
said digital wave data and said text data in response to said voice input such that
said digital wave data and said text data can be stored in a synchronized
manner.

2. The system as recited in Claim 1 further comprising a first switch
coupled between said HCE and said memory sub-system and also between said
VRE and said memory sub-system, said first switch configured to couple one of
said HCE and said VRE to said memory sub-system and to simultaneously
decouple the other one of said HCE and said VRE from said memory
sub-system.

3. The system as recited in Claim 1 further comprising:
a decoder selectively coupled to said memory sub-system and configured
to decode said digital wave data into decoded audio data;
a digital-to-analog (D/A) converter coupled to said decoder and configured
to convert said decoded audio data into an analog signal; and
an audio output device coupled to said D/A converter and configured to
render a voice output corresponding to said voice input from said analog signal.

4. The system as recited in Claim 3 further comprising an amplifier
coupled between said D/A converter and said audio output device and
configured to amplify said analog signal.

5. The system as recited in Claim 3 further comprising a display
sub-system selectively coupled to said memory sub-system and configured to display
said text data.

6. The system as recited in Claim 5 further comprising a second
switch coupled between said decoder and said memory sub-system and also
between said display sub-system and said memory sub-system, said second
switch configured to couple one of said decoder and said display sub-system to
said memory sub-system and to simultaneously decouple the other one of said
decoder and said display sub-system from said memory sub-system.

7. The system as recited in Claim 5 wherein said display sub-system
comprises a liquid crystal display (LCD).

8. The system as recited in Claim 1 wherein said system is portable
and battery-powered.

9. The system as recited in Claim 1 wherein said memory sub-system
comprises semiconductor flash memory.

10. The system as recited in Claim 1 wherein said VRE uses Hidden
Markov Model (HMM) techniques to perform voice recognition.

11. The system as recited in Claim 1 wherein said HCE is operable to
achieve a compression ratio of two kilobits per second (2 kbit/s).

12. The system as recited in Claim 2 wherein said first switch is
controlled based on said text data.

13. A method for audio recording and playback in a portable device,
said method comprising the steps of:
a) capturing a voice input;
b) performing high compression encoding on said voice input to
generate digital wave data;
c) performing voice recognition on said voice input to generate text
data;
d) storing said digital wave data and said text data in said portable
device; and
wherein said steps b) and c) are performed concurrently to generate said
digital wave data and said text data in response to said voice input such that said
digital wave data and said text data can be stored in a synchronized manner.

14. The method as recited in Claim 13 wherein said step d) comprises
the step d1) of alternately storing portions of said digital wave data and
corresponding portions of said text data such that said digital wave data and said
text data is synchronized.

15. The method as recited in Claim 13 further comprising the steps of:
e) retrieving said digital wave data from said portable device;
f) decoding said digital wave data into decoded audio data;
g) converting said decoded audio data into an analog signal; and
h) generating a voice output corresponding to said voice input from
said analog signal.

16. The method as recited in Claim 15 further comprising the step of
amplifying said analog signal.

17. The method as recited in Claim 15 further comprising the steps of:
i) retrieving said text data from said portable device; and
j) displaying said text data.

18. The method as recited in Claim 17 wherein said step e) comprises
the step of retrieving portions of said digital wave data from said portable device
and said step i) comprises the step of retrieving portions of said text data
corresponding to said portions of said digital wave data from said portable
device, and wherein said steps e) and i) are performed alternately such that said
retrieving of said digital wave data and said text data is synchronized.

19. The method as recited in Claim 17 wherein said step j) comprises
the step of displaying said text data on a liquid crystal display (LCD).

20. The method as recited in Claim 13 wherein said portable device is
battery-powered.

21. The method as recited in Claim 13 wherein said step d) comprises
the step of storing said digital wave data and said text data in semiconductor
flash memory within said portable device.

22. The method as recited in Claim 13 wherein said step c) comprises
the step of performing voice recognition on said voice input to generate text data
using Hidden Markov Model (HMM) techniques.

23. The method as recited in Claim 13 wherein said high compression
encoding achieves a compression ratio of two kilobits per second (2 kbit/s).

24. The method as recited in Claim 14 wherein said step d1) is
controlled based on said text data.

2& A d i gita l r e cording and playback syst e m compr i s i ng: 

an aud i o captur i ng moans for r e c ei v i ng a voice i nput; 

a h i gh compress i on encoding m e ans coup l ed to sa i d audio captur i ng 

m e ans for gen e rating d i gita l wavo data corr e spond i ng to sa i d voice input; 

25 a vo i c e r e cognit i on m e ans coup le d to said audio captur i ng m e ans for 

g e n e rat i ng t e xt data corr e spond i ng to sa i d vo i c e input; and 
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a storage means s ele ct i v el y coup l ed to sa i d high comprossion e ncod i ng 

moans and sa i d vo i co rocogn i tion moans for stor i ng sa i d dig i ta l wav e data a nd 
sa i d toxt data, whoro i n sa i d h i gh compr e ssion e ncod i ng moans and said voic o 
rocognition moans are op e rab l e to concurr e ntly gonorate sa i d d i gita l wavo data 
5 and sa i d toxt data i n rosponso to sa i d voico i nput such that said d i g i tal wavo data 
and sa i d t o xt data can b e stored i n a synchron i zed manner. 

26. The system as recited in Claim 25 further comprising a first
switching means coupled between said high compression encoding means and
said storage means and also between said voice recognition means and said
storage means, said first switching means for coupling one of said high
compression encoding means and said voice recognition means to said storage
means while simultaneously decoupling the other one of said high compression
encoding means and said voice recognition means from said storage means.


27. The system as recited in Claim 25 further comprising:

a decoding means selectively coupled to said storage means for decoding
said digital wave data into decoded audio data;

a digital-to-analog (D/A) converting means coupled to said decoding
means for converting said decoded audio data into an analog signal; and

an audio output means coupled to said D/A converting means for
generating a voice output corresponding to said voice input from said analog
signal.

28. The system as recited in Claim 27 further comprising an amplifying
means coupled between said D/A converting means and said audio output
means for amplifying said analog signal.

29. The system as recited in Claim 27 further comprising a display
means selectively coupled to said storage means for displaying said text data.

30. The system as recited in Claim 29 further comprising a second
switching means coupled between said decoding means and said storage
means and also between said display means and said storage means, said
second switch for coupling one of said decoding means and said display means
to said storage means while simultaneously decoupling the other one of said
decoding means and said display means from said storage means.

63. A recording and playback system comprising: 

an audio capturing device configured to receive an analog input; 

an encoder coupled to said audio capturing device and configured to 

generate a digital signal based on said analog input; and

a recognition engine coupled to said audio capturing device and 

configured to generate text data based on said analog input, wherein said 
encoder and said recognition engine simultaneously generate said digital signal 
and said text data such that said digital signal and said text data can be provided 

in a synchronized manner.

64. The system as recited in Claim 63 further comprising a first switch 

coupled between said encoder and a memory sub-system and also between said 
recognition engine and said memory sub-system, said first switch configured to
couple one of said encoder and said recognition engine to said memory
sub-system and to simultaneously decouple the other one of said encoder and said
recognition engine from said memory sub-system.
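
The switching behavior recited in Claim 64 can be illustrated with a minimal sketch. This is only one possible software model under stated assumptions, not the claimed hardware: the names `Source`, `FirstSwitch`, `select`, and `write` are hypothetical, and a Python list stands in for the memory sub-system.

```python
from enum import Enum


class Source(Enum):
    """The two components that the first switch can couple to memory."""
    ENCODER = "encoder"
    RECOGNIZER = "recognizer"


class FirstSwitch:
    """Hypothetical model: exactly one source is coupled to memory at a time."""

    def __init__(self):
        self.coupled = Source.ENCODER  # assumed initial position
        self.memory = []               # stands in for the memory sub-system

    def select(self, source):
        # Coupling one source decouples the other in the same operation,
        # because only a single position is ever stored.
        self.coupled = source

    def write(self, source, payload):
        # Data from the decoupled source does not reach memory.
        if source is not self.coupled:
            return False
        self.memory.append((source.value, payload))
        return True


switch = FirstSwitch()
switch.write(Source.ENCODER, b"\x01\x02")   # stored: encoder is coupled
switch.write(Source.RECOGNIZER, "hello")    # rejected: recognizer is decoupled
switch.select(Source.RECOGNIZER)
switch.write(Source.RECOGNIZER, "hello")    # stored after switching
```

Holding a single position in `coupled` is the design choice that makes the coupling and decoupling simultaneous in this model: there is no state in which both sources, or neither, are connected.
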

65. The system as recited in Claim 64 further comprising:

a decoder selectively coupled to said memory sub-system and configured 

to decode said digital signal into decoded audio data; 

a digital-to-analog (D/A) converter coupled to said decoder and configured

to convert said decoded audio data into an analog signal; and 

an audio output device coupled to said D/A converter and configured to 

render a voice output corresponding to said analog input from said analog signal. 

66. The system as recited in Claim 65 further comprising an amplifier

coupled between said D/A converter and said audio output device and 
configured to amplify said analog signal. 

67. The system as recited in Claim 65 further comprising a display

sub-system selectively coupled to said memory sub-system and configured to display
said text data. 

68. The system as recited in Claim 67 further comprising a second 

switch coupled between said decoder and said memory sub-system and also 
between said display sub-system and said memory sub-system, said second
switch configured to couple one of said decoder and said display sub-system to 
said memory sub-system and to simultaneously decouple the other one of said 
decoder and said display sub-system from said memory sub-system. 

69. The system as recited in Claim 67 wherein said display sub-system

comprises a liquid crystal display (LCD).

70. The system as recited in Claim 63 wherein said system is portable
and battery-powered.

71. The system as recited in Claim 64 wherein said memory

sub-system comprises semiconductor flash memory.

72. The system as recited in Claim 63 wherein said recognition engine 

uses Hidden Markov Model (HMM) techniques to perform recognition.

73. The system as recited in Claim 63 wherein said encoder is

operable to achieve a rate of two kilobits per second (2 kbit/s). 
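
To give a sense of scale for the recited rate, the arithmetic can be sketched as follows. The sketch assumes 1 kbit = 1000 bits and ignores any storage overhead; the function names and the 1,000,000-byte capacity are illustrative assumptions, not values from this application.

```python
BIT_RATE = 2_000  # bits per second, assuming 1 kbit = 1000 bits


def encoded_bytes(seconds, bit_rate=BIT_RATE):
    """Bytes of encoded data produced by recording for `seconds`."""
    return seconds * bit_rate / 8


def recording_seconds(capacity_bytes, bit_rate=BIT_RATE):
    """Seconds of recording that fit in `capacity_bytes` of storage."""
    return capacity_bytes * 8 / bit_rate


print(encoded_bytes(60))             # 15000.0 bytes per minute of speech
print(recording_seconds(1_000_000))  # 4000.0 seconds in one million bytes
```
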

74. The system as recited in Claim 64 wherein said first switch is 

controlled based on said text data. 


75. A method for audio recording and playback in a portable device, 

said method comprising the steps of: 
a) capturing a first analog signal; 

b) encoding said first analog signal to generate a digital signal; 

c) performing recognition on said analog signal to generate text data;

wherein said b) and c) are performed simultaneously to generate said 

digital signal and said text data in response to said first analog signal such that 
said digital signal and said text data can be stored in a synchronized manner in a 
memory device. 

76. The method as recited in Claim 75 further comprising:

alternately storing portions of said digital signal and corresponding

portions of said text data such that said digital signal and said text data are
synchronized.
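
The alternating storage recited in Claim 76 can be sketched as an interleaved record list. This is a minimal illustration under assumptions not stated in the claims: each audio portion is taken to have exactly one corresponding text portion, and the record layout and the names `interleave` and `text_for_audio` are hypothetical.

```python
def interleave(audio_portions, text_portions):
    """Alternate audio and text portions so each pair is stored side by side."""
    if len(audio_portions) != len(text_portions):
        raise ValueError("expected one text portion per audio portion")
    records = []
    for index, (audio, text) in enumerate(zip(audio_portions, text_portions)):
        # The shared index is what keeps the two streams synchronized.
        records.append(("audio", index, audio))
        records.append(("text", index, text))
    return records


def text_for_audio(records, index):
    """Look up the text portion stored alongside audio portion `index`."""
    for kind, record_index, payload in records:
        if kind == "text" and record_index == index:
            return payload
    return None


records = interleave([b"a0", b"a1"], ["hello", "world"])
```

Because every audio record is immediately followed by its text record, playback can move from any audio portion to the matching text by reading the next record, or by index as `text_for_audio` does.
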

77. The method as recited in Claim 75 further comprising the steps of:

d) decoding said digital signal into decoded audio data; 

e) converting said decoded audio data into a second analog signal; 

and 

f) generating a voice output corresponding to said first analog signal 

from said second analog signal.

78. The method as recited in Claim 77 further comprising the step of 

amplifying said second analog signal. 

79. The method as recited in Claim 77 further comprising the steps of:

g) displaying said text data on a display device coupled to said 

portable device. 

80. The method as recited in Claim 79 wherein said display device is a 

liquid crystal display (LCD).

81. The method as recited in Claim 75 wherein said portable device is

battery-powered. 

82. (New) The method as recited in Claim 75 wherein said memory

device is a flash memory coupled to said portable device.

83. (New) The method as recited in Claim 75 wherein a Hidden Markov
Model (HMM) technique is used for said recognition.

84. (New) The method as recited in Claim 75 wherein said encoding is 
performed at a rate of substantially two kilobits per second (2 kbit/s).

DIGITAL RECORDING AND PLAYBACK SYSTEM
WITH VOICE RECOGNITION CAPABILITY 
FOR CONCURRENT TEXT GENERATION 



ABSTRACT OF THE INVENTION

A recording and playback system is provided. The system includes an 
audio capturing device configured to receive an analog input and an encoder 
coupled to the audio capturing device and configured to generate a digital signal
based on the analog input. The system further includes a recognition engine 
coupled to the audio capturing device and configured to generate text data
based on the analog input, wherein the encoder and the recognition engine 
simultaneously generate the digital signal and the text data such that the digital 
signal and the text data can be provided in a synchronized manner. 

A digital recording and playback system with built-in voice recognition
capability for concurrent text generation. In one embodiment, the system
comprises an audio capturing device configured to receive a voice input. The
system also comprises a high compression encoder (HCE) coupled to the audio
capturing device and configured to generate digital wave data corresponding to
the voice input, as well as a voice recognition engine (VRE) coupled to the audio
capturing device and configured to generate text data corresponding to the voice
input. In this embodiment, the HCE and VRE are selectively coupled to a
memory sub-system which is configured to store the digital wave data and the
text data. In this embodiment, the VRE performs voice-to-text conversion using
the high quality audio input signal rather than highly compressed voice data so
that high quality conversion is achieved. In this embodiment, the HCE and the
VRE are operable to concurrently generate the digital wave data and the text
data in response to the voice input such that the digital wave data and the text
data can be stored in the memory sub-system in a synchronized manner. As
such, this embodiment of the present invention provides recording capability
wherein text data is generated from a voice input without requiring post-recording
conversion. In a specific embodiment, the present invention includes the above
and wherein the system is battery-powered and is portable.